<hl>Note: This is an appendix to the "Parse-O-Matic Scripts" user manual.</hl>
<h2>Table of Contents</h2>
You can click on any section title below to jump directly to that section.
<a href="#INTRODUC">Introduction</a>
<a href="#TYPESOFC">Types of Comparators</a>
<a href="#LITERALC">Literal Comparators</a>
<a href="#LITEXAMP">Examples</a>
<a href="#LITCOMSO">Literal Comparisons and Sort Order</a>
<a href="#NUMERICC">Numerical Comparators</a>
<a href="#NUMEEXAM">Examples</a>
<a href="#NUMERISO">Numeric Comparisons and Sort Order</a>
<a href="#LENGTHCO">Length Comparators</a>
<a href="#MATCHESC">Comparing Patterns</a>
<a name="INTRODUC"><h2>Introduction</h2>
A 'comparator' is a parameter used in scripting commands which compares one value to another. For example:
<hl>
If AreaCode = '416' Output 'Toronto'
</hl>
In this example, a comparison is being made between the variable named AreaCode and the literal '416'. The equals sign is the 'comparator'.
Now consider this command:
<hl>
If AreaCode = '514' Region = 'Montreal'
</hl>
In this case, the first equals sign is a comparator because it is comparing two values. The second equals sign is <b>not</b> a comparator; it is actually the 'Equals' command, which assigns a value to a variable.
<a name="TYPESOFC"><h2>Types of Comparators</h2>
Parse-O-Matic Scripting supports several types of comparators:
Note # 1: Depends on sort order. For a discussion of what this means, refer
to <a href="#LITCOMSO">Literal Comparisons and Sort Order</a>.
Note # 2: The two values are considered basically the same if they
contain the same text, regardless of upper or lower case and
any surrounding whitespace. Thus ' CHESHIRE CAT ' is
considered the same as 'Cheshire Cat'.
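The rule in Note # 2 can be modelled in a few lines of Python. This is only a sketch of the behaviour described in the note, not Parse-O-Matic code; the function name basically_same is hypothetical.

```python
def basically_same(left: str, right: str) -> bool:
    """Model of Note # 2: ignore case and surrounding whitespace."""
    return left.strip().lower() == right.strip().lower()


# Surrounding spaces and letter case are ignored.
print(basically_same(' CHESHIRE CAT ', 'Cheshire Cat'))   # True

# Whitespace inside the text still counts in this sketch.
print(basically_same('Cheshire Cat', 'Cheshire  Cat'))    # False
```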
<a name="LITEXAMP"><h3>Examples</h3>
With some restrictions (discussed later), literal comparators work on both
numeric and alphabetic data. Here are some examples of literal comparisons
that are 'true':
<hl>
'ABC' <> 'ABCD'          '333' <> '444'
'ABC' <= 'ABCD'          '333' <= '444'
'ABC' < 'ABCD'           '333' < '444'
'ABC' Shorter 'ABCD'     '333' SameLen '444'
'ABC' >= 'ABC'           'ABC' <> 'CDE'
'ABC' <= 'ABC'           'ABC' <= 'CDE'
'ABC' = 'ABC'            'ABC' < 'CDE'
'ABC' SameLen 'ABC'      'ABC' SameLen 'CDE'
'ABC' ^ 'AB'             'ABC' ~ 'CD'
'ABC' ^ 'ABC'            'ABC' ~ 'CC'
</hl>
Note especially the ^ (contains) and ~ (does not contain) comparators. These are extremely useful when analyzing data.
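If you think in terms of a general-purpose language, the ^ and ~ comparators behave like a substring test and its negation. The Python sketch below reproduces the four examples shown earlier. The function names are hypothetical, and the sketch assumes the test is case-sensitive, which this appendix does not state.

```python
def contains(value: str, fragment: str) -> bool:
    """Model of the ^ comparator: fragment occurs somewhere in value."""
    return fragment in value


def does_not_contain(value: str, fragment: str) -> bool:
    """Model of the ~ comparator: fragment occurs nowhere in value."""
    return fragment not in value


print(contains('ABC', 'AB'))           # True
print(contains('ABC', 'ABC'))          # True
print(does_not_contain('ABC', 'CD'))   # True
print(does_not_contain('ABC', 'CC'))   # True
```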
<a name="LITCOMSO"><h3>Literal Comparisons and Sort Order</h3>
Some of the literal comparators compare text according to 'PC-ASCII sort order'. For plain English text, this works fine. However, if your text contains diacritical (accented) characters, you should be aware that some comparisons will not work correctly. For example, the 'A-Umlaut' character appears in the PC-ASCII character set <b>after</b> the PC-ASCII value for 'Z'.
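The effect can be demonstrated outside Parse-O-Matic. The Python sketch below assumes that 'PC-ASCII' means IBM PC code page 437 (an assumption; the appendix does not name the code page) and sorts three words by their byte values in that code page.

```python
def pc_ascii_value(ch: str) -> int:
    """Byte value of a single character in code page 437 (assumed)."""
    return ch.encode('cp437')[0]


print(pc_ascii_value('Z'))   # 90
print(pc_ascii_value('Ä'))   # 142

# Sorting by byte value puts the accented word after 'Zebra'.
words = ['Zebra', 'Äpfel', 'Apple']
print(sorted(words, key=lambda w: w.encode('cp437')))
# ['Apple', 'Zebra', 'Äpfel']
```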
<a name="NUMERICC"><h2>Numerical Comparators</h2>
Here is a list of the numerical comparators:
<hl>
----------    ----------------------
Comparator    Meaning
----------    ----------------------
#=            Equal
#<>           Not equal
#>            Greater than
#>=           Greater than or equal
#<            Less than
#<=           Less than or equal
----------    ----------------------
</hl>
Numerical comparators avoid the problem of sort order. For a discussion of this, see <a href="#NUMERISO">Numeric Comparisons and Sort Order</a>.
<a name="NUMEEXAM"><h3>Examples</h3>
Here are some examples of numeric comparisons (encoded variously with and without surrounding quotes) that are 'true':
<hl>
345 #<> 567          '1.23' #<> '9.87'
345 #<= 567          '1.23' #<= '9.87'
567 #> 345           9.87 #> '1.23'
'3' #< '6.2'
</hl>
The last example compares an integer ('3') with a real number ('6.2'). The numeric comparators automatically check if one of the numbers contains a decimal point.
In such a case, the comparison is performed in 'real number' mode, which imposes the same accuracy restrictions as those imposed by the CalcReal command. This might create a problem if you are comparing a decimal number with a large integer, but this is rarely a cause for worry, since most data analysis tends to compare similar types of numbers.
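The mode switch can be modelled as follows. This Python sketch is not the Parse-O-Matic implementation: the function name numeric_compare is hypothetical, and Python's floating-point type merely stands in for 'real number' mode, whose actual accuracy limits are those of CalcReal and are not reproduced here.

```python
import operator

# The numerical comparators from the table, mapped to Python operators.
NUMERIC_OPS = {
    '#=': operator.eq, '#<>': operator.ne,
    '#>': operator.gt, '#>=': operator.ge,
    '#<': operator.lt, '#<=': operator.le,
}


def numeric_compare(left, op: str, right) -> bool:
    """Compare as reals if either side has a decimal point, else as integers."""
    left, right = str(left).strip(), str(right).strip()
    if '.' in left or '.' in right:
        a, b = float(left), float(right)   # 'real number' mode (stand-in)
    else:
        a, b = int(left), int(right)       # integer mode
    return NUMERIC_OPS[op](a, b)


print(numeric_compare('3', '#<', '6.2'))     # True
print(numeric_compare(9.87, '#>', '1.23'))   # True

# A large integer can lose precision once real-number mode is used:
print(numeric_compare('9007199254740993', '#=', '9007199254740992.0'))  # True
print(numeric_compare('9007199254740993', '#=', '9007199254740992'))    # False
```

The last two lines show the kind of problem described above: the two values differ, yet they compare as equal as soon as one of them carries a decimal point.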
<a name="NUMERISO"><h3>Numeric Comparisons and Sort Order</h3>
You can get unintended results when you use literal comparators on numbers. For example, this does not work as you might expect at first glance:
<hl>
count = count+
If count >= 2 OutEnd count
</hl>
You might expect this to output any number greater than or equal to 2, but in fact you will get a different result, because the comparison is a literal (text) comparison. In the example above, '2' to '9' are greater than or equal to '2', but '10' (which starts with '1') is considered <b>less</b>, as is evident when you sort several numbers alphabetically:
<hl>
1
10
100
11
15
2
20
200
3
30
</hl>
As you can see, the values 1, 10, 100, 11 and 15 come before '2' when sorted
alphabetically.
To compare numbers, you should use the numerical comparators. The correct
way to code the previous example is as follows:
<hl>
count = count+
If count #>= 2 OutEnd count
</hl>
Written in this way, numbers greater than or equal to two will be sent to the output file.
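The difference between the two comparisons is easy to reproduce in other languages. The Python sketch below is an illustration only, with hypothetical variable names: it runs a counter from 1 to 12 and filters it once with a text comparison and once with a numeric comparison.

```python
# Counter values 1 to 12, held as text the way a script variable would be.
counts = [str(n) for n in range(1, 13)]

# Literal (text) comparison, like: If count >= 2
literal = [c for c in counts if c >= '2']

# Numerical comparison, like: If count #>= 2
numeric = [c for c in counts if int(c) >= 2]

print(literal)
# ['2', '3', '4', '5', '6', '7', '8', '9']
print(numeric)
# ['2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12']
```

The text comparison silently drops '10', '11' and '12', because each of them starts with '1'.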
<a name="LENGTHCO"><h2>Length Comparators</h2>
Here is a list of the length comparators:
<hl>
----------    ----------------------
Comparator    Meaning
----------    ----------------------
Len=          Equal
Len<>         Not equal
Len>          Greater than
Len>=         Greater than or equal
Len<          Less than
Len<=         Less than or equal
----------    ----------------------
</hl>
The length of the value on the left side of the comparator is compared with
a <b>number</b> on the right side of the comparator. For example:
<hl>
If $Data Len= 0 NullLine = 'Yes'
</hl>
Of course, you could accomplish the same thing with this command:
<hl>
If $Data = '' NullLine = 'Yes'
</hl>
However, in most cases the length comparisons will save you some coding because you will not have to use the Len command to obtain a variable for comparison.
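In other words, a length comparator measures the left-hand value and compares that length with the number on the right. The Python sketch below models this; the function name length_compare is hypothetical and the code is an illustration, not part of Parse-O-Matic.

```python
import operator

# The length comparators from the table, mapped to Python operators.
LENGTH_OPS = {
    'Len=': operator.eq, 'Len<>': operator.ne,
    'Len>': operator.gt, 'Len>=': operator.ge,
    'Len<': operator.lt, 'Len<=': operator.le,
}


def length_compare(value: str, op: str, number: int) -> bool:
    """Compare the length of value (left side) with a number (right side)."""
    return LENGTH_OPS[op](len(value), number)


print(length_compare('', 'Len=', 0))       # True: an empty line
print(length_compare('ABC', 'Len<', 4))    # True
print(length_compare('ABC', 'Len>=', 4))   # False
```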
<a name="MATCHESC"><h2>Comparing Patterns</h2>
The Matches comparator compares a value against a pattern that uses "regular expression" syntax (explained later). For example:
<hl>
If MyVar Matches 'c[aou]t' GotMatch = 'Yes'
</hl>
This will set the variable GotMatch to 'Yes' if the value of MyVar is 'cat', 'cot' or 'cut' (case is ignored).
The pattern must be the second item in the comparison.
For the comparison to be "true", the item being compared must match the pattern precisely; the Matches comparator does not look for substrings.
If you want to allow a substring to match, use the Comprises comparator. For example:
<hl>
If MyVar Comprises 'c[ao]t' GotMatch = 'Yes'
</hl>
This will set GotMatch to 'Yes' if MyVar includes either 'cat' or 'cot' anywhere in its text. Thus, the strings 'He had a cat' and 'He had a cot' both Comprise the pattern, as do the strings 'cat', 'cot', 'Cat', 'scatter' and so on.
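The distinction between the two comparators corresponds to the difference between a whole-string match and a search. The Python sketch below models it with the standard re module. The function names are hypothetical, and Python's regular expression dialect may differ in detail from the one Parse-O-Matic uses, so treat this as an illustration of the idea only.

```python
import re


def matches(value: str, pattern: str) -> bool:
    """Model of Matches: the whole value must fit the pattern (case ignored)."""
    return re.fullmatch(pattern, value, flags=re.IGNORECASE) is not None


def comprises(value: str, pattern: str) -> bool:
    """Model of Comprises: the pattern may fit any substring (case ignored)."""
    return re.search(pattern, value, flags=re.IGNORECASE) is not None


print(matches('cut', 'c[aou]t'))            # True
print(matches('scatter', 'c[aou]t'))        # False: not a precise match
print(comprises('scatter', 'c[ao]t'))       # True: 'cat' is inside
print(comprises('He had a cot', 'c[ao]t'))  # True
print(comprises('cut', 'c[ao]t'))           # False: 'u' is not in [ao]
```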
Click <a href="pommel://Help-RegExp.txt">here</a> to learn about regular expression syntax.